
    Open source software GitHub ecosystem: a SEM approach

    Open source software (OSS) is a collaborative effort, and affordable, high-quality software with a lower probability of errors or failures is within reach. Thousands of open source projects (termed repositories, or repos) offer alternatives to proprietary software development. More than two-thirds of companies contribute to open source. Open source technologies such as OpenStack, Docker and KVM are being used to build the next generation of digital infrastructure. An iconic hub of OSS is GitHub, a successful social coding site. GitHub is a hosting platform for repositories based on the Git version control system. It is a knowledge-based workspace with several features that facilitate user communication and work integration. In this thesis I use data extracted from GitHub to better understand the OSS ecosystem and the extent to which each of its deployed elements affects the successful development of that ecosystem. In addition, I investigate a repo's growth over different time periods to test how the repo's behavior changes. Our observations show that developers do not follow a single development methodology when developing and growing their projects; instead, they tend to cherry-pick from the different software methodologies available. The GitHub API is the main source used to extract the metadata for this research. This extraction process is time-consuming because of restrictive access limits, even with authentication. I apply Structural Equation Modelling (SEM) to investigate the path relationships between the GitHub-deployed OSS elements, and I determine the path-strength contribution of each element to the OSS repo's activity level. SEM is a multivariate statistical technique that combines factor analysis and multiple regression analysis to analyze the structural relationships between measured variables and/or latent constructs.
    This thesis bridges the research gap around longitudinal OSS studies. It engages large sample-size OSS repo metadata sets, data-quality control, and comparisons across multiple programming languages. Querying GitHub is neither direct nor simple, yet querying for all valid repos remains important, because illegal or unrepresentative outlier repos (which may even be quite popular) sometimes arise, and these need to be removed from each initial language-specific OSS metadata set. Eight top GitHub programming languages (selected as those with the most-forked repos) are engaged separately in this research. The thesis observes these eight metadata sets of GitHub repos and, over time, measures the different repo contributions of the deployed elements in each set. The number of stars given to a repo contributes relatively weakly to its software development processes. Forks sometimes work against the repo's progress, both by generating very minor negative total effects on its commit (activity) level and by diluting the focus of its software development strategies. Here, a fork may generate new ideas, create a new repo, and then draw some of the original repo's developers off into this new development direction, thus slowing the original repo's commit (activity) level progression. Multiple intermittent minor version releases produce smaller changes in a GitHub JavaScript repo's commit (activity) level, because they often involve only slight OSS improvements and require only a few commits. More commits also bring more documentation changes, which again raise the GitHub OSS repo's commit (activity) level. There are both direct and indirect drivers of a repo's OSS activity. Pulls and commits are the strongest drivers.
    This suggests that generating more pull requests is likely a prime target for the repo creator's core team of developers. This study offers a big data direction for future work. It allows for the deployment of more sophisticated statistical comparison techniques, and it offers further indications of the internal and broad relationships that likely exist within GitHub's OSS big data. Its data extraction ideas suggest a link through to business/consumer consumption, and possibly to how these may be connected using improved repo search algorithms that release individual business value components.
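
    The abstract separates direct and indirect drivers of repo activity. In a recursive (acyclic) path model, the total effect of one element on another is its direct path plus, for every indirect route, the product of the path coefficients along that route; in matrix form T = (I - B)^-1 - I = B + B^2 + .... The sketch below assumes a hypothetical three-element model (forks -> pulls -> commits, plus a direct forks -> commits path) with made-up coefficients. It is not the thesis's fitted model; it only shows how a negative direct effect can be partly offset by a positive indirect one.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Multiply two square matrices of the same size."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def total_effects(b: Matrix) -> Matrix:
    """Total effects of a recursive path model.

    b[i][j] is the direct path coefficient from variable j to variable i.
    For an acyclic model b is nilpotent, so (I - B)^-1 - I = B + B^2 + ... + B^(n-1).
    """
    n = len(b)
    total = [row[:] for row in b]
    power = [row[:] for row in b]
    for _ in range(n - 1):
        power = matmul(power, b)  # contributions of routes one step longer
        for i in range(n):
            for j in range(n):
                total[i][j] += power[i][j]
    return total


# Hypothetical variable order and illustrative coefficients only.
FORKS, PULLS, COMMITS = 0, 1, 2
B = [
    [0.0, 0.0, 0.0],
    [0.5, 0.0, 0.0],   # forks -> pulls
    [-0.4, 0.6, 0.0],  # forks -> commits (direct), pulls -> commits
]
T = total_effects(B)
direct = B[COMMITS][FORKS]
indirect = T[COMMITS][FORKS] - direct
print(f"direct={direct:.2f} indirect={indirect:.2f} total={T[COMMITS][FORKS]:.2f}")
# prints "direct=-0.40 indirect=0.30 total=-0.10"
```

    Summing matrix powers avoids a general matrix inverse and makes each route length explicit; SEM packages report the same decomposition directly once a model is estimated.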

    GitHub: Factors Influencing Project Activity Levels

    Open source software projects typically extend the capabilities of their software by incorporating code contributions from a diverse cross-section of developers. This GitHub structural path modelling study captures the current top 100 JavaScript projects that have been in operation for at least one year. It draws on three theories (information integration, planned behavior, and social translucence) to frame its comparative path approach and to show ways to speed the collaborative development of GitHub OSS projects. It shows that a project's activity level increases with (1) greater responder-group collaborative efforts, (2) more major critical project version releases, and (3) the generation of further commits. However, the generation of additional forks negatively impacts overall project activity levels.
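
    In its simplest form, a path coefficient such as "forks -> activity" is a standardized partial regression weight. The sketch below computes those weights in closed form for one dependent variable and two correlated predictors, using Pearson correlations. The variable names and the toy data are hypothetical, and the study's full structural path model (with more elements and intermediate paths) would normally be estimated with a dedicated SEM package.

```python
from math import sqrt
from typing import Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length samples."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def standardized_paths(
    y: Sequence[float], x1: Sequence[float], x2: Sequence[float]
) -> Tuple[float, float]:
    """Standardized partial regression weights (path coefficients) of y on x1 and x2."""
    r_y1, r_y2, r_12 = pearson(y, x1), pearson(y, x2), pearson(x1, x2)
    denom = 1.0 - r_12 ** 2  # zero when x1 and x2 are perfectly collinear
    return (r_y1 - r_y2 * r_12) / denom, (r_y2 - r_y1 * r_12) / denom


# Hypothetical toy data, constructed so that activity = 10 + commits - forks.
commits = [3, 1, 3, 1]
forks = [3, 3, 1, 1]
activity = [10, 8, 12, 10]
b1, b2 = standardized_paths(activity, commits, forks)
print(f"commits -> activity: {b1:.3f}, forks -> activity: {b2:.3f}")
# prints "commits -> activity: 0.707, forks -> activity: -0.707"
```

    Because correlations ignore location and scale, the weights are comparable across elements measured in different units (commit counts versus fork counts), which is why path studies usually report standardized coefficients.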

    Analysing Big Data Projects Using Github and JavaScript Repositories

    GitHub open source software developers remain in short supply. Successful GitHub projects offer multiple pathways for developers to contribute to their repositories. In this study, GitHub JavaScript big data are path modelled to identify the significant developer contribution pathways that raise a project's activity level. These significant pathways give the project's creator decision-making benchmarks that can be used to trigger faster software development through to the project's next completion point. This approach has behavioural consumptive-value connotations that may provide a future pathway towards tapping big data sources and delivering real business value.
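
    Picking out the "significant" contribution pathways usually means testing each estimated path against its standard error. The sketch below applies a two-tailed critical-ratio (z) test at the 5% level, a large-sample normal approximation of the kind SEM tools commonly report next to each estimate. The path names, estimates and standard errors are invented for illustration and are not results from the study.

```python
from math import erfc, sqrt
from typing import Dict, List, Tuple


def significant_paths(
    paths: Dict[str, Tuple[float, float]], alpha: float = 0.05
) -> List[Tuple[str, float, float]]:
    """Return (path, z, p) for paths whose two-tailed p-value is below alpha."""
    result = []
    for name, (estimate, se) in paths.items():
        z = estimate / se                 # critical ratio
        p = erfc(abs(z) / sqrt(2))        # two-tailed p-value under a standard normal
        if p < alpha:
            result.append((name, z, p))
    # Strongest pathways (largest |z|) first.
    return sorted(result, key=lambda item: abs(item[1]), reverse=True)


# Hypothetical (estimate, standard error) pairs, for illustration only.
paths = {
    "pulls -> commits": (0.62, 0.08),
    "stars -> commits": (0.05, 0.04),
    "forks -> commits": (-0.12, 0.05),
}
for name, z, p in significant_paths(paths):
    print(f"{name}: z={z:.2f}, p={p:.4f}")
```

    Here the weak "stars -> commits" path (z = 1.25) drops out, while the negative forks path is kept because significance depends on |z|, not on the sign of the estimate.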

    A four stage approach towards speeding GitHub OSS development

    Many open source software (OSS) project creators adopt GitHub as their chosen online repository. They seek out others within the global OSS community of developers, who are then encouraged to add their capabilities, ideas and code to the creator's developing OSS project. A structural equation modelling study of three top OSS programming languages deploys GitHub's operational elements as a four-stage directional suite of (1) dependent, (2) intermediary, and (3) independent elements. It shows that a project's activity levels can be enhanced when additional project contributions are pursued effectively, stage by stage. A staged development approach helps creators understand the process of attracting OSS developers into their GitHub projects.
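
    A staged, directional model orders elements so that independent elements feed intermediaries, which in turn feed the dependent activity measure. One way to derive such stages from a set of hypothesised directed paths is to give each element the length of the longest path reaching it from a source element, as sketched below with a topological sort. The element names and edges are hypothetical, and the abstract does not say how its own four stages were assigned.

```python
from collections import defaultdict, deque
from typing import Dict, Iterable, List, Tuple


def assign_stages(edges: Iterable[Tuple[str, str]]) -> Dict[str, int]:
    """Stage of each element = length of the longest directed path reaching it.

    Raises ValueError if the hypothesised paths form a cycle (a non-recursive model).
    """
    successors: Dict[str, List[str]] = defaultdict(list)
    indegree: Dict[str, int] = defaultdict(int)
    for src, dst in edges:
        successors[src].append(dst)
        indegree[dst] += 1
        indegree.setdefault(src, 0)
    # Independent (source) elements start at stage 0.
    stage = {node: 0 for node, deg in indegree.items() if deg == 0}
    queue = deque(stage)
    visited = 0
    while queue:  # Kahn's topological sort, relaxing longest-path depths
        node = queue.popleft()
        visited += 1
        for nxt in successors[node]:
            stage[nxt] = max(stage.get(nxt, 0), stage[node] + 1)
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                queue.append(nxt)
    if visited != len(indegree):
        raise ValueError("path model contains a cycle")
    return stage


# Hypothetical directed paths between GitHub elements.
edges = [("stars", "forks"), ("forks", "pulls"), ("pulls", "commits"), ("releases", "commits")]
stages = assign_stages(edges)
print(sorted(stages.items(), key=lambda kv: (kv[1], kv[0])))
```

    In this toy graph, "stars" and "releases" land in the first stage, "forks" and "pulls" act as intermediaries, and "commits" is the dependent element in the fourth stage.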

    A preliminary exploration of the GitHub ecosystem: how to find important repositories

    GitHub is arguably the most influential platform currently available for hosting version-controlled OSS. It is used by indie developers and large global companies alike. GitHub is an OSS developer service hub where a creator's raw project data can be accessed. However, with large numbers of developers and projects hosted on GitHub, it remains difficult to extract useful knowledge from them, since not all repositories are legal, accurate, or still active. In this paper, we suggest better ways to query GitHub and extract useful repositories that are highly related to the search criteria. The paper also contains a trend analysis of the top JavaScript, Java, and Python GitHub repositories. Observations indicate that attributes other than stars and forks could improve query results by steering searches away from questionable repositories. The paper concludes by offering insights into future larger-scale studies.
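
    The abstract argues that stars and forks alone are weak query filters. The sketch below builds a GitHub REST search URL using documented repository-search qualifiers (`language:`, `stars:`, `archived:`, `pushed:`) and then screens returned repository objects on further fields (`archived`, `fork`, `license`, `pushed_at`). The helper names, thresholds and screening rules are hypothetical illustrations, not the paper's criteria, and no network request is made here.

```python
from datetime import datetime, timezone
from typing import Any, Dict
from urllib.parse import urlencode

SEARCH_URL = "https://api.github.com/search/repositories"


def build_search_url(language: str, min_stars: int, pushed_after: str, page: int = 1) -> str:
    """Compose a repository-search URL: most-starred first, 100 results per page."""
    query = f"language:{language} stars:>={min_stars} archived:false pushed:>{pushed_after}"
    params = {"q": query, "sort": "stars", "order": "desc", "per_page": 100, "page": page}
    return f"{SEARCH_URL}?{urlencode(params)}"


def is_usable(repo: Dict[str, Any], now: datetime, max_idle_days: int = 365) -> bool:
    """Hypothetical screen: drop archived repos, forks, unlicensed repos and stale repos."""
    if repo.get("archived") or repo.get("fork"):
        return False
    if repo.get("license") is None:  # the API reports null when no license is detected
        return False
    pushed = datetime.strptime(repo["pushed_at"], "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)
    return (now - pushed).days <= max_idle_days


url = build_search_url("JavaScript", 1000, "2024-01-01")
print(url)

# Hypothetical, trimmed repository objects in the shape returned by the search API.
now = datetime(2024, 6, 1, tzinfo=timezone.utc)
mit = {"spdx_id": "MIT"}
sample = [
    {"full_name": "a/active", "archived": False, "fork": False, "license": mit, "pushed_at": "2024-05-20T10:00:00Z"},
    {"full_name": "b/archived", "archived": True, "fork": False, "license": mit, "pushed_at": "2024-05-20T10:00:00Z"},
    {"full_name": "c/stale", "archived": False, "fork": False, "license": mit, "pushed_at": "2019-01-01T00:00:00Z"},
    {"full_name": "d/fork", "archived": False, "fork": True, "license": mit, "pushed_at": "2024-05-01T00:00:00Z"},
]
kept = [repo["full_name"] for repo in sample if is_usable(repo, now)]
print(kept)  # ['a/active']
```

    GitHub's search API returns at most 1,000 results per query, so larger language-specific samples are usually assembled by splitting a query, for example into `pushed:` or `created:` date ranges. Responses also carry `X-RateLimit-Remaining` and `X-RateLimit-Reset` headers that a crawler can use to pause between requests.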
